376 results found.
Speech/Written
Corpus
Language Type:
Bilingual
Languages:
English Spanish
Availability:
Freely Available
License:
CreativeCommons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Size:
1.7 GByte
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Prosodic Phrase Alignment for Machine Dubbing
- Paper track: 12.19 Other topics in Spoken Language Processing: /Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alp Öktem | Heroes Corpus | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Basque Belgian Dutch Croatian Czech Galician Greek Hungarian Portuguese Slovak Slovenian Spanish
Availability:
From Owner
License:
Size:
None
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs
- Paper track: 5.4 Speech and audio segmentation/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Lukas Mateju | COST278 database | /N |
Documentation:
None
Speech
Software Toolkit
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
MIT
Size:
None
Production Status:
Newly created-in progress
Use:
phonological classification
- Paper title: Phonet: a Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech
- Paper track: 3.5 Pathological speech and language/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Juan Camilo Vásquez Correa | phonet | /N |
Documentation:
https://phonet.readthedocs.io/en/latest/
Written
Corpus
Language Type:
Multilingual
Languages:
Dutch English German Spanish
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Named Entity Recognition
- Paper title: Sources of Transfer in Multilingual Named Entity Recognition
- Paper track: Long/Information Extraction
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | David Mueller | CoNLL 2003 NER shared task corpus | /N |
Documentation:
None
Speech
Corpus
Language Type:
Multilingual
Languages:
Cantonese English French German Gishu Greek Gujarati Hebrew Hindi Indonesian Japanese Korean Mandarin Persian Portuguese Runyankore Russian Spanish Turkish Vietnamese
Availability:
Freely Available
License:
OpenSource
Size:
22.8 GByte
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: Speaking rate, information density, and information rate in first-language and second-language speech
- Paper track: 1.10 Bilingual and L2 acquisition and processing/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ann Bradlow | The ALLSSTAR Corpus | /N |
Documentation:
Documentation in English is available to the public (via the project website)
Written
Corpus
Language Type:
Multilingual
Languages:
English German Spanish
Availability:
From Owner
License:
Size:
10000 Onion Addresses Other
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
- Paper title: The Language of Legal and Illegal Activity on the Darknet
- Paper track: Long/Applications
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Daniel Hershcovich | DUTA-10K | /N |
Documentation:
None
Written
Corpus
Language Type:
Multilingual
Languages:
English French German Italian Spanish
Availability:
Freely Available
License:
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Size:
~1000000 sentences
Production Status:
Newly created-finished
Use:
Word Sense Disambiguation
- Paper title: Just "OneSeC" for Producing Multilingual Sense-Annotated Data
- Paper track: Long/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Tommaso Pasini | OneSeC | /N |
Documentation:
None
Written
Corpus
Language Type:
Bilingual
Languages:
Dutch Spanish
Availability:
License:
Size:
None
Production Status:
Existing-used
Use:
- Paper title: Multi-Source Cross-Lingual Model Transfer: Learning What to Share
- Paper track: Long/Multilinguality
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Xilun Chen | CoNLL 2002 shared task (NER) data | /N |
Documentation:
None
Written
Corpus
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
1173 sentences
Production Status:
Existing-used
Use:
Knowledge Discovery/Representation
- Paper title: AutoML Strategy Based on Grammatical Evolution: A Case Study about Knowledge Discovery from Text
- Paper track: Long/Information Extraction and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Suilan Estevez-Velarde | eHealth-KD Corpus | /N |
Documentation:
Documentation available in the resource URL
Written
Corpus
Language Type:
Bilingual
Languages:
Dutch Spanish
Availability:
From Data Center(s)
License:
Size:
None
Production Status:
Existing-used
Use:
- Paper title: Neural Architectures for Nested NER through Linearization
- Paper track: Short/Tagging, Chunking, Syntax and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jana Straková | CoNLL 2002 Shared Task Named Entity Data | /N |
Documentation:
None




